Abstract—Dezert-Smarandache Theory (DSmT) of plausible and paradoxical reasoning has excellent performance when the data are uncertain or conflicting. However, the methods developed in DSmT are in general very computationally expensive, so they may not be directly applied to multiple data sources with high cardinality. In this paper, we explore the feasibility of using DSmT in practical applications through a case study. Specifically, we propose a DSm hybrid model with a reduced number of classes, and thus low computational cost, to analyze temperature and humidity data received from multiple sensors to determine comfort zones in a smart building. Data from each sensor is considered as individual evidence that can be uncertain, imprecise and even conflicting. Several types of combination rules are applied to calculate the total mass function. Then the belief, plausibility and pignistic probability are deduced and used as metrics for decision making to determine the comfort level of the monitored environment. Both simulation and real data experiments demonstrate that the proposed method can make DSmT feasible for practical situation awareness applications.


Keywords-Dezert-Smarandache Theory (DSmT), Dempster-Shafer Theory (DST), Comfort Zone, Uncertain Data Fusion, Smart Building, Multi Sensor, Multi Hypothesis.


I. INTRODUCTION 


In future smart buildings or smart environments, numerous sensors will be deployed for monitoring and surveillance. As a result, large amounts of data will be collected from various sources. In many practical cases, the data may contain uncertainties and may even be conflicting. How to use such data to make inferences and decisions is a challenge.

Dempster-Shafer theory [1] has been used to combine data (called evidence) from multiple sources. Compared to the traditional Bayesian method, Dempster-Shafer theory offers more flexibility in specifying ignorance and uncertainty in the data. When the level of conflict among data sources becomes large, and the refinement of the frame of discernment is inaccessible because of the vague and imprecise nature of its elements, Dezert-Smarandache theory of plausible and paradoxical reasoning (DSmT) [2] can be applied as a powerful tool to combine the data. However, the methods in the DSmT framework are in general very computationally expensive, so in many big data processing applications they may not be directly applied to multiple data sources with high cardinality.

Figure 1. Relative humidity / temperature comfort zone (ISO7730-1984)


In this paper, we explore the feasibility of using DSmT in practical applications through a case study. Specifically, we propose a modified algorithm that uses DSmT with reduced computational cost to analyze temperature and humidity data received from multiple sensors to determine comfort zones in a smart building. The comfort zone is defined as the range of temperature and humidity in which people feel comfortable [3]; it is also known as thermal or human comfort. Evaluating the comfort zone depends on several parameters and even varies from person to person. Fig.1 shows the "Comfort Zone" according to the ISO7730-1984 standard. It was designed based on several experiments and a large amount of empirical data collected over several years at different locations. As the graphs show, the comfort zone differs between the winter and summer seasons.


In traditional buildings, sensors are installed at fixed places and may not be able to measure the locations of interest. In our previous work [4], we proposed a novel framework for an environmental air quality monitoring system based on community sensing (see Fig.2). Leveraging the high penetration of smartphones and the low cost and small form factor of certain sensors with a Bluetooth module, critical measurements such as air quality can be taken by a sensor carried by each member of a community, sent to that person's smartphone, and eventually uploaded to a server using a corresponding app. The aggregated data at the server side can then be processed to determine the comfort zone and control the HVAC¹ system to optimize electricity usage while keeping the inhabitants comfortable. In this project, we designed the architecture of the proposed community sensing system and implemented it using commercial off-the-shelf (COTS) Sensordrone [5] devices paired with Android© smartphones. Our system measures temperature, humidity, pressure, carbon monoxide, and battery charge level in real time, and it provided the experimental data used in this study.

Figure 2. The architecture of the proposed community sensing system

The rest of this paper is organized as follows. Section II introduces the Dempster-Shafer theory of evidence (DST), and Section III introduces the Dezert-Smarandache theory (DSmT) of plausible and paradoxical reasoning. Section IV describes the data collection that provides our real-data evidence. Section V presents our case studies on both synthetic and real data: we propose our models, apply several types of combination rules to calculate the total mass, belief, plausibility and pignistic probability, and compare the decisions based on these metrics across models and combination rules. Section VI concludes the paper.

II. DEMPSTER-SHAFER THEORY 


Dempster-Shafer theory (DST) of evidence originated in Dempster's work [6] on upper and lower probabilities and was later extended by Shafer's work [1] on belief functions. It is an extension of traditional Bayesian probability that makes it possible to deal with uncertainty. To better understand Dempster-Shafer theory, we first introduce some definitions [7]:


¹Heating, ventilation, and air conditioning.

Frame of discernment: let Θ be a finite set of elements. Elements here refer to hypotheses or classes, which in our study are feeling zones. Θ is called the frame of discernment. In the Dempster-Shafer model, all elements of Θ are assumed to be exclusive and exhaustive. The power set of Θ, which includes all subsets of Θ, is denoted by 2^Θ. Basically, the power set includes all the elements of Θ and all combinations of their unions, so it is closed under the union operator.
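As a small illustration of how quickly 2^Θ grows, the Python sketch below enumerates the power set of a three-element frame as frozensets (the labels l1 to l3 are hypothetical placeholders) and prints |2^Θ| = 2^|Θ| for the 9-class and 25-class frames used later in this paper. It is an illustration only, not part of the authors' implementation.

```python
from itertools import combinations


def power_set(theta):
    """Return all subsets of the frame `theta` (including the empty set) as frozensets."""
    elems = sorted(theta)
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]


theta = {"l1", "l2", "l3"}  # hypothetical three-class frame
subsets = power_set(theta)
print(len(subsets))     # 8, i.e. 2**3
print(2 ** 9, 2 ** 25)  # 512 33554432: sizes of 2^Theta for 9 and 25 classes
```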

Mass Function: the mass function, or basic belief assignment (bba), m assigns a number in [0,1] to each element of 2^Θ in such a way that:

$$m: 2^{\Theta} \rightarrow [0,1] \qquad (1)$$

$$m(\emptyset) = 0 \qquad (2)$$

$$\sum_{A \in 2^{\Theta}} m(A) = 1 \qquad (3)$$


Here m(A) expresses the level of confidence in A, where A is an element of 2^Θ, i.e., a subset of Θ. In our study, the mass function expresses the degree of belief in each feeling class. If m(A) > 0, the subset A is called a focal element. When A contains more than one element, the mass m(A) cannot be decomposed into masses for the individual elements, because we have no further information about each element separately. One of the main differences between traditional Bayesian probability and Dempster-Shafer theory is the uncertainty mass m(Θ) in DST:

$$m(\Theta) = 1 - \sum_{A \in 2^{\Theta},\, A \neq \Theta} m(A) \qquad (4)$$


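Combination rule of Dempster-Shafer: in many big data applications, different types of data are aggregated from multiple sensors that may originate from multiple sources. The combined mass function can be calculated with the Dempster-Shafer rule:

$$m(A) = (m_1 \oplus m_2 \oplus \cdots \oplus m_n)(A) \qquad (5)$$

$$m(A) = \begin{cases} 0, & A = \emptyset \\ \dfrac{1}{1-K} \displaystyle\sum_{\bigcap_{k=1}^{n} A_k = A} m_1(A_1)\, m_2(A_2) \cdots m_n(A_n), & A \neq \emptyset \end{cases} \qquad (6)$$

$$K = \sum_{\bigcap_{k=1}^{n} A_k = \emptyset} m_1(A_1)\, m_2(A_2) \cdots m_n(A_n) \qquad (7)$$

$$1 - K = \sum_{\bigcap_{k=1}^{n} A_k \neq \emptyset} m_1(A_1)\, m_2(A_2) \cdots m_n(A_n) \qquad (8)$$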


Here K is the conflict among all the sources of information and is used as a normalization factor, K ∈ [0, 1); the rule is undefined when K = 1. A higher value of K indicates more conflict among the information sources. For example, for two sensors the Dempster-Shafer combination rule is:

$$K = \sum_{A_1 \cap A_2 = \emptyset} m_1(A_1)\, m_2(A_2) \qquad (9)$$

$$m(A) = \frac{1}{1-K} \sum_{A_1 \cap A_2 = A} m_1(A_1)\, m_2(A_2), \quad A \neq \emptyset \qquad (10)$$
The DST combination rule is associative, commutative and Markovian. For more than two sources of data (called evidences in DST), the rule can be extended by combining two mass functions, then combining the result with the next evidence, and so on until all sources of evidence have been combined. We applied this method for the DST combination.
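The two-source rule (9)-(10) and the sequential extension just described can be sketched in Python as below. This is an illustration under our own assumptions, not the authors' MATLAB program: a mass function is a dict mapping frozenset focal elements to masses, and the example masses are hypothetical, in the spirit of Zadeh's well-known high-conflict example.

```python
from functools import reduce
from itertools import product


def dempster_combine(m1, m2):
    """Two-source Dempster rule, eqs. (9)-(10).

    Mass functions are dicts mapping frozenset focal elements to masses.
    Returns the normalized combined masses and the conflict K.
    """
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # this product contributes to K, eq. (9)
    if conflict >= 1.0:
        raise ValueError("total conflict (K = 1): Dempster's rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


def dempster_combine_all(sources):
    """Combine n sources one after another (valid because the rule is associative)."""
    return reduce(lambda acc, m: dempster_combine(acc, m)[0], sources)


A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
m1 = {A: 0.9, C: 0.1}
m2 = {B: 0.9, C: 0.1}
m12, K = dempster_combine(m1, m2)
print(round(K, 2), {"".join(sorted(s)): round(v, 3) for s, v in m12.items()})
```

With these inputs K is about 0.99 and all of the normalized mass ends up on C, which both sources considered unlikely; this is the kind of high-conflict behaviour for which DSmT is recommended.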

Associated with the mass function, the belief function is defined as:

$$Bel(x) = \sum_{y \in 2^{\Theta},\, y \subseteq x} m(y) \qquad (13)$$

and the plausibility function is calculated as:

$$Pl(x) = \sum_{y \in 2^{\Theta},\, x \cap y \neq \emptyset} m(y) = 1 - Bel(\bar{x}) \qquad (14)$$

where x̄ is the complement of x, x̄ = Θ − x. It follows that Pl(A) ≥ Bel(A). The belief interval [Bel(A), Pl(A)] expresses the imprecision on the true probability, with the belief function as the lower probability and the plausibility function as the upper probability.
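A minimal Python sketch of (13) and (14), assuming the same dict-of-frozensets representation of a mass function; the frame and the masses are hypothetical and serve only to show a belief interval and the identity Pl(x) = 1 − Bel(x̄).

```python
def belief(m, x):
    """Bel(x), eq. (13): total mass of the nonempty focal elements contained in x."""
    return sum(v for y, v in m.items() if y and y <= x)


def plausibility(m, x):
    """Pl(x), eq. (14): total mass of the focal elements that intersect x."""
    return sum(v for y, v in m.items() if y & x)


theta = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, theta: 0.2}  # hypothetical bba
x = frozenset({"a"})
# Bel({a}) = 0.5 and Pl({a}) = 1.0 (up to floating-point rounding)
print(belief(m, x), plausibility(m, x))
```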
The pignistic probability introduced in [8] is defined as:

$$BetP(x) = \sum_{y \in 2^{\Theta},\, y \neq \emptyset} \frac{|x \cap y|}{|y|}\, m(y) \qquad (15)$$

where |x| is the cardinality of x. The pignistic probability maps beliefs to a probability in order to make a hard decision. Belief functions provide a pessimistic view and plausibility functions an optimistic one; the pignistic probability is a compromise between the two.
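Equation (15) in Python, reusing the hypothetical bba from the belief example: each focal element shares its mass equally among the singletons it contains, and a hard decision takes the singleton with the largest BetP. Choosing the maximum is our illustrative decision rule here.

```python
def pignistic(m, x):
    """BetP(x), eq. (15): each nonempty focal element y spreads m(y) evenly over its elements."""
    return sum(len(x & y) / len(y) * v for y, v in m.items() if y)


theta = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, theta: 0.2}  # hypothetical bba
betp = {e: pignistic(m, frozenset({e})) for e in sorted(theta)}
decision = max(betp, key=betp.get)  # hard decision: singleton with the largest BetP
# BetP({a}) is about 0.717, between Bel({a}) = 0.5 and Pl({a}) = 1.0; decision is "a"
print(betp, decision)
```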

Reliable decision making using big data fusion is a challenge. Although there is no unique metric for the best decision, four metrics, namely total mass, belief, plausibility and pignistic probability, are tested in our simulations and experiments.


III. DEZERT-SMARANDACHE THEORY


Dezert-Smarandache theory of plausible and paradoxical reasoning (DSmT) is an extension of DST and a generalization of both DST and traditional Bayesian probability. DSmT performs better when the uncertainty or conflict among evidences is high. In DSmT, the hyper power set of Θ is denoted by D^Θ. It includes all the elements of Θ and all combinations of their unions and intersections. Thus D^Θ is closed under both the union and the intersection operators, while the power set 2^Θ of DST is closed under union only. Unlike DST, DSmT does not require the elements of Θ to be exclusive. The cardinality of the hyper power set is much larger than that of the power set. Similar to DST, in DSmT the mass function, or generalized basic belief assignment (gbba), is defined as a mapping m: D^Θ → [0,1] with m(∅) = 0 and Σ_{A ∈ D^Θ} m(A) = 1. Belief, plausibility and generalized pignistic probability functions are defined as [2]:


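$$Bel(x) = \sum_{y \in D^{\Theta},\, y \subseteq x} m(y) \qquad (16)$$

$$Pl(x) = \sum_{y \in D^{\Theta},\, x \cap y \neq \emptyset} m(y) = 1 - Bel(\bar{x}) \qquad (17)$$

$$BetP(x) = \sum_{y \in D^{\Theta},\, y \neq \emptyset} \frac{\mathcal{C}_{\mathcal{M}}(x \cap y)}{\mathcal{C}_{\mathcal{M}}(y)}\, m(y) \qquad (18)$$

where C_M(x) is the DSm cardinality of x, i.e., the number of parts x has in the Venn diagram of the model M.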
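One way to make the DSm cardinality concrete (our own illustration, not a method from the paper) is to encode each element of D^Θ by the set of Venn-diagram parts it covers: intersection and union then become set operations, and C_M(x) is simply the number of parts. The sketch assumes a two-element frame in the free DSm model, whose Venn diagram has three parts, labelled "1", "2" and "12" in the spirit of Smarandache's codification; the gbba is hypothetical. A hybrid model would be handled by deleting the parts that its constraints force to be empty.

```python
from fractions import Fraction

# Venn-diagram parts of the free DSm model for Theta = {theta1, theta2}:
# "1" = only theta1, "2" = only theta2, "12" = theta1 ∩ theta2.
theta1 = frozenset({"1", "12"})
theta2 = frozenset({"2", "12"})
both = theta1 & theta2    # theta1 ∩ theta2 -> {"12"}
either = theta1 | theta2  # theta1 ∪ theta2 -> {"1", "2", "12"}


def dsm_cardinality(x):
    """C_M(x): number of Venn-diagram parts covered by x in the model."""
    return len(x)


def generalized_betp(m, x):
    """Generalized pignistic probability, eq. (18)."""
    return sum(Fraction(dsm_cardinality(x & y), dsm_cardinality(y)) * v
               for y, v in m.items() if y)


# Hypothetical gbba on D^Theta; exact fractions avoid rounding noise.
m = {theta1: Fraction(4, 10), theta2: Fraction(3, 10),
     both: Fraction(2, 10), either: Fraction(1, 10)}
print(generalized_betp(m, theta1))  # 49/60
```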

Several combination rules have been developed for the DSmT model [2]. These rules manage or redistribute conflicting mass in different ways and have different computational complexity; in fact, numerous combination rules can be defined to redistribute conflict among the elements of the hyper power set. The classic DSm rule of combination, the hybrid DSm rule, and the series of proportional conflict redistribution rules (PCR) from PCR1 to PCR6 are some of them [2]. PCR5 is one of the most accurate rules in managing conflict: it redistributes each partial conflict only between the two elements involved in that partial conflict. However, compared to other methods it is hard to implement due to its high computational cost.

Figure 4. Temperature, Humidity data and Comfort zone

For two sources of evidence, the PCR5 rule is defined for all X ∈ D^Θ \ {∅} as:


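$$m_{PCR5}(X) = m_{12}(X) + \sum_{Y \in D^{\Theta},\, X \cap Y = \emptyset} \left[ \frac{m_1(X)^2\, m_2(Y)}{m_1(X) + m_2(Y)} + \frac{m_2(X)^2\, m_1(Y)}{m_2(X) + m_1(Y)} \right]$$

where m_12 refers to the conjunctive consensus:

$$m_{12}(X) = \sum_{X_1, X_2 \in D^{\Theta},\, X_1 \cap X_2 = X} m_1(X_1)\, m_2(X_2) \qquad (19)$$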
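The two-source PCR5 rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' MATLAB implementation: focal elements are frozensets and an empty intersection marks a partial conflict (encoding each element of D^Θ by its Venn-diagram parts would let the same code handle a hybrid model), and a term whose denominator m1(X) + m2(Y) is zero is skipped, as is customary. The masses are hypothetical, in the spirit of Zadeh's example.

```python
from itertools import product


def pcr5(m1, m2):
    """Two-source PCR5 combination.

    Focal elements are frozensets; an empty intersection denotes a partial conflict.
    """
    result = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            # Conjunctive consensus m12, eq. (19).
            result[inter] = result.get(inter, 0.0) + a * b
        elif a + b > 0:
            # Partial conflict m1(X) m2(Y) goes back to X and Y only,
            # proportionally to m1(X) and m2(Y).
            result[x] = result.get(x, 0.0) + a * a * b / (a + b)
            result[y] = result.get(y, 0.0) + b * b * a / (a + b)
    return result


A, B, C = frozenset({"A"}), frozenset({"B"}), frozenset({"C"})
fused = pcr5({A: 0.9, C: 0.1}, {B: 0.9, C: 0.1})
# approximately A: 0.486, B: 0.486, C: 0.028
print({"".join(sorted(s)): round(v, 3) for s, v in fused.items()})
```

For these inputs Dempster's rule would put all of the mass on C, whereas PCR5 keeps C small and splits the conflicting mass between A and B.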


PCR5 can be applied to more than two data sources [9]. Fig.3 shows the flowchart of applying a DSmT combination rule (here PCR5 as an example) from sensing data to decision making.

Except for the classic DSm rule of combination, all other combination rules based on the DSmT model are non-associative and non-Markovian. This implies that, for more than two sources of evidence, a combination rule cannot be applied blindly between two mass functions in the repetitive way used in DST: because these rules are non-associative and non-Markovian, the order in which the sources are combined can change the result. To calculate the PCR5 rule for more than two sources, we adopt a method introduced in [10] that preserves the associativity and Markovian requirements and thus guarantees the correctness of the final combination. In effect, this algorithm transforms a non-associative and non-Markovian rule into a quasi-associative and quasi-Markovian rule.

To implement the PCR5 rule with this algorithm for n ≥ 3 sources, we first calculate the conjunctive part, m_12(X), between the first two sources, transfer the whole conflicting mass to either the empty set or a non-empty set (we used the non-empty set Θ), and save the result. Then we calculate the conjunctive rule between the saved result and the third source. We repeat this for the first n − 1 sources. Finally, we apply the PCR5 rule between the conjunctive result of the n − 1 sources and the last source. This algorithm has the advantage that the order of the sources in the combination no longer matters, and both the associative and Markovian properties are satisfied.
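The steps above can be sketched in Python as follows, assuming the frozenset representation used earlier and following the text literally: conjunctive combination over the first n − 1 sources with each step's conflict moved to Θ, then PCR5 with the last source. The three sources are hypothetical; the example only checks that, for this input, swapping the first two sources leaves the result unchanged. It is not the authors' implementation of [10].

```python
from functools import reduce
from itertools import product


def conjunctive_to_theta(m1, m2, theta):
    """Conjunctive rule whose total conflicting mass is moved to the whole frame theta."""
    out, conflict = {}, 0.0
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            out[inter] = out.get(inter, 0.0) + a * b
        else:
            conflict += a * b
    out[theta] = out.get(theta, 0.0) + conflict
    return out


def pcr5(m1, m2):
    """Two-source PCR5: conjunctive consensus plus proportional conflict redistribution."""
    result = {}
    for (x, a), (y, b) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            result[inter] = result.get(inter, 0.0) + a * b
        elif a + b > 0:
            result[x] = result.get(x, 0.0) + a * a * b / (a + b)
            result[y] = result.get(y, 0.0) + b * b * a / (a + b)
    return result


def pcr5_n_sources(sources, theta):
    """Conjunctive rule over the first n-1 sources, then PCR5 with the last source."""
    if len(sources) < 2:
        raise ValueError("need at least two sources")
    *first, last = sources
    partial = reduce(lambda acc, m: conjunctive_to_theta(acc, m, theta), first)
    return pcr5(partial, last)


theta = frozenset({"A", "B", "C"})
A, B, C = (frozenset({s}) for s in "ABC")
s1 = {A: 0.6, theta: 0.4}  # hypothetical sensor masses
s2 = {B: 0.5, theta: 0.5}
s3 = {A: 0.7, C: 0.3}
fused = pcr5_n_sources([s1, s2, s3], theta)
print({"".join(sorted(k)): round(v, 3) for k, v in fused.items()})
```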


IV. DATA COLLECTION 


In our experimental data collection, we used the proposed platform in Fig.2 to monitor the air quality inside an apartment during the summer season. We placed five sets of Sensordrone nodes and Android© smartphones with the related apps in different parts of the apartment, named room1, room2, living, dining, and kitchen. For all five sensor nodes, the sensing interval is set to five seconds. The sensor nodes measured temperature, humidity, pressure, carbon monoxide, and the battery level of the node. In addition, time stamps and GPS location data are uploaded to the server. We only used the temperature and humidity data in our case study.

As an example, the monitored temperature and humidity data for a ten-hour sensing period, comprising 7500 data samples, and their mapping onto the comfort zone are shown in Fig.4. The fluctuations in temperature and humidity are caused by the air conditioner (AC) running periodically for cooling during the summer. The AC was set to 77 °F and was turned off for the last four hours, after which temperature and humidity started to increase in all places, as expected.


V. DATA ANALYSIS AND DECISION MAKING 


This section explains the details of our proposed model and the implementation of the DST and DSmT combination rules based on it. Both simulation results and real data analysis based on the experiments are presented to determine the comfort zone inside the apartment. According to the "Comfort Zone" of the ISO7730-1984 standard [3] shown in Fig.1, we defined 9 classes/zones: the comfort zone and 8 other classes around it. We call this model the first model. Fig.5 shows the 9 classes for the summer season. In Fig.5, small blue square markers show the centers of the classes, and red solid lines mark the boundaries between classes. Table I lists the temperature and humidity values and the feeling definition for each class of the first model. Thus the frame of discernment for feeling zone evaluation is

$$\Theta = \{l_1, l_2, \ldots, l_9\} \qquad (20)$$

Since these classes are exclusive and exhaustive, the model satisfies the Dempster-Shafer model. Note that measurement uncertainty is not considered in this model.

Table I
THE PROPOSED 9 CLASSES

The feeling definitions in Table I, based on Fig.5, describe the human feeling for the range of temperature and humidity in each class. For example, class I means "too cold and humid", and so on. Based on the proposed model in Fig.5, the Dempster-Shafer combination rule can be applied to our data to compute the total mass. Because all 9 classes in this model are singletons and exclusive, the total mass and belief functions are equal.

To calculate the mass function, we first calculate the normalized Euclidean distance between the measured data from the sensors and the class parameters:

$$d_i^j = \left( \sum_{r=1}^{2} \left( \frac{S_r^i - f_r^j}{f_{r,\max} - f_{r,\min}} \right)^2 \right)^{1/2} \qquad (21)$$

Here d_i^j is the distance between sensor i and class j, S_r^i is the r-th measurement (temperature or humidity) of sensor i, and f_r^j is the corresponding value of class j; f_{r,max} − f_{r,min} is used for normalization. Then, for sensor node k, the distance values for all n classes are:

$$D_k = \{d_k^1, d_k^2, \ldots, d_k^n\} \qquad (22)$$
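A minimal Python sketch of (21) and (22). The feature ranges and class centers below are hypothetical values chosen only for illustration (the actual class parameters are listed in Table I); each feature difference is divided by its own range before squaring, so temperature and humidity become comparable.

```python
import math


def normalized_distance(reading, center, ranges):
    """Eq. (21): Euclidean distance after dividing each feature difference by (f_max - f_min)."""
    return math.sqrt(sum(((s - f) / (f_max - f_min)) ** 2
                         for s, f, (f_min, f_max) in zip(reading, center, ranges)))


def distance_vector(reading, class_centers, ranges):
    """Eq. (22): distances from one sensor reading to every class center."""
    return [normalized_distance(reading, c, ranges) for c in class_centers]


# Hypothetical values: (temperature in °F, relative humidity in %).
ranges = [(68.0, 84.0), (20.0, 80.0)]                 # assumed (f_min, f_max) per feature
centers = [(72.0, 60.0), (76.0, 45.0), (80.0, 30.0)]  # assumed class centers
reading = (77.0, 50.0)
d = distance_vector(reading, centers, ranges)
print([round(v, 3) for v in d])  # the second center is the closest one
```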


The smaller the distance d_k^j, the higher the probability that the class of sensor k is l_j. The mass function is therefore calculated from the distance values for each sensor node k and each class j:

$$m_k(l_j) = \frac{1/d_k^j}{\sum_{i=1}^{n} 1/d_k^i} \qquad (23)$$

Finally, the mass functions of sensor node k for all n classes are:

$$M_k = \{m_k(l_1), m_k(l_2), \ldots, m_k(l_n)\} \qquad (24)$$

Figure 6. Mass-decision for diagonal test data set - 9 classes DST model
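Equation (23) in Python: each class receives mass proportional to the inverse of its distance, normalized so that the masses of one sensor sum to 1. The handling of a zero distance (a reading exactly at a class center) is our own assumption to avoid division by zero; the paper does not discuss this case. The distance vector is hypothetical.

```python
def masses_from_distances(distances):
    """Eq. (23): mass of each class proportional to 1/d, normalized to sum to 1."""
    zeros = [j for j, d in enumerate(distances) if d == 0]
    if zeros:
        # Assumption (not in the paper): a reading exactly at a class center
        # puts all of its mass on that class.
        return [1.0 if j == zeros[0] else 0.0 for j in range(len(distances))]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


m_k = masses_from_distances([0.1, 0.2, 0.4])  # hypothetical D_k for three classes
print([round(v, 3) for v in m_k])  # [0.571, 0.286, 0.143]
```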
To evaluate our proposed method, we generated several random test data sets and fed them as input to our MATLAB® program to calculate the Dempster-Shafer combination based on Fig.3. One of the test data sets is shown in Fig.6. In Fig.6, eight sets of random data are chosen that move diagonally from bottom left to top right over time. The first two sets are inside zone seven (VII), the next set is inside zone four (IV), the next three sets are inside the comfort zone, and the last two sets are in zone three (III). The total mass function and the related decision based on the maximum mass value are also shown. In the first model all nine classes are singletons and exclusive, so mass and belief are equal. Although plausibility is greater than mass by the final uncertainty value (a small offset), the overall decision results are the same for mass, belief, plausibility and pignistic probability, as expected.
Similarly, we fed our experimental data to the algorithm in Fig.3 to calculate the Dempster-Shafer combination. Fig.7 shows the total mass and the related decision. It is observed that, when the AC is turned on, the decision moves four times from class 6 into the comfort zone (class 5). We calculated the related conflict over all ten hours, and the conflict values are high. Thus, to overcome the effect of high conflict, we need to apply DSmT.


According to the Sensordrone specification document [11], the accuracy of the temperature sensor is ±0.5 °C (±0.9 °F) and the accuracy of the humidity sensor is ±2 %RH. Therefore, measurements reported by Sensordrone sensors carry an uncertainty determined by the accuracy range of each sensor. Thus we expand our first model into a more accurate one, shown in Fig.8. The dashed lines in Fig.8 are drawn around the solid lines (the boundaries between classes) based on the ±0.9 °F and ±2 %RH measurement errors of the temperature and humidity sensors, respectively. That means each class can extend from its solid-line boundary to the nearby dashed line. We call this the second model. The second model can be treated in two different ways: as a refined Dempster-Shafer model or as a hybrid DSm model. Because the original nine classes are no longer completely exclusive in the second model, it is not a Dempster-Shafer model any more. In fact, the second model is a hybrid DSm model rather than a free DSm model, because some classes are exclusive while others intersect. For example, based on Fig.8, class one intersects classes 2, 4 and 5, while it is exclusive of classes 3, 6, 7, 8 and 9. Class 5 is the only class that intersects all other classes.


If we define each new decomposed area in Fig.8 as a new class, we obtain 25 classes with no intersection among them, and hence a Dempster-Shafer model with 25 exclusive and exhaustive classes. In this case, the Dempster-Shafer combination rule can be applied to calculate the total mass functions on the new 25 classes instead of the previous 9 classes. Note that we use this model only as a ground truth in this study. The number of decomposed areas without any intersection with each other grows exponentially when uncertainties exist, so in practice the huge number of decomposed areas would prevent the use of DST with exclusive and exhaustive classes. In contrast, the number of classes remains the same for the DSm hybrid model, as explained later.
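The count of 25 regions can be checked by enumeration under an assumed geometry: the nine classes of Fig.8 form a 3×3 grid in the temperature-humidity plane, numbered row by row (so class 1 is "too cold and humid"), and every internal boundary becomes a tolerance band shared by its two neighbours. Along each axis this gives five intervals (core, band, core, band, core), hence 5 × 5 = 25 disjoint regions. The Python sketch labels each region by the classes it belongs to, which reproduces codes such as <36> and <2356> used in the discussion of Fig.10 and the intersections stated above for classes 1 and 5. It illustrates the geometry only and is not the authors' code.

```python
from itertools import product

# Assumed layout: class indices 0-2 along each axis; a tolerance band between two
# neighbouring classes belongs to both of them.
AXIS_INTERVALS = [(0,), (0, 1), (1,), (1, 2), (2,)]


def decomposed_regions():
    """Codes (sorted class numbers 1-9) of the disjoint regions of the banded 3x3 grid."""
    codes = []
    for rows, cols in product(AXIS_INTERVALS, AXIS_INTERVALS):
        classes = sorted(3 * r + c + 1 for r in rows for c in cols)
        codes.append("".join(str(k) for k in classes))
    return codes


def intersecting_classes(k, codes):
    """Classes that share at least one region with class k."""
    return {int(ch) for code in codes if str(k) in code for ch in code} - {k}


regions = decomposed_regions()
print(len(regions))                              # 25
print(sorted(intersecting_classes(1, regions)))  # [2, 4, 5]
```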


Fig.9 shows the results for the random test data set. Decisions based on belief, plausibility and pignistic probability are similar and outperform the decision based on the mass function. Fig.10 shows the total mass, the belief functions and the related decisions for our ten hours of data. According to Fig.10, the maximum mass moves between class 18 (the intersection of classes 3 and 6, <36>, in Smarandache's codification [12]) and class 23 (the intersection of classes 2, 3, 5 and 6, <2356> [12]). Even if we consider only the maximum mass among the original focal classes one to nine, the maximum mass moves four times between class 6 and the comfort zone. As a result, decision making by total mass does not give reasonable results, and the belief, plausibility and pignistic probability functions appear better suited for decision making. Fig.10 shows that when the AC turned on six times, the decision based on belief (like the plausibility and pignistic probability decisions) moved six times from class 6 to the comfort zone (class 5). Hence, with decisions based on belief, plausibility and pignistic probability, the proposed second model outperforms the first model. Nevertheless, Fig.10 also shows that the conflict values did not decrease for the new model in DST mode; they are even higher.

Table II
PROPOSED 25 CLASSES FOR SUMMER SEASON

Figure 9. Mass-belief decision for test data set - 25 classes DST model

Table III
AVERAGE RUN-TIME

Average Run-Time (Seconds): 0.1601211 | 0.8071339 | 4.2721231

As an alternative, we treat Fig.8 as a DSm hybrid model with nine classes. These classes are not completely exclusive, but they are exhaustive. We applied the PCR5 rule using the quasi-associative method outlined in Section III. The results in Fig.11 and Fig.12 show that the PCR5 decision is very accurate even though there are only nine classes in the DSm hybrid model.

Table III compares the average run time of the three methods discussed. The DSmT model with PCR5 needs more computation time in this test case. However, the DSmT model remains practical because its number of classes stays the same, while the DST model does not, due to the exponential growth in the number of classes explained before. Thus the DSmT model with PCR5 is expected to be appropriate for big data processing with a large number of classes or high cardinality. Furthermore, the DSmT model with PCR5 outperforms the DST model with the same number of classes by a large margin. Define P_D as the detection probability, i.e., the probability of correctly detecting the comfort zone (P_D = Pr(H_1 | H_1)). Similarly, P_F is defined as the probability of a wrong decision (P_F = Pr(H_1 | H_0)). Using the 25-class DST model as the ground truth, we calculate P_D and P_F for the 9-class DST model and the DSmT model. Table IV shows that the DSmT model has a much higher P_D = 96.31%, compared with P_D = 26.58% for the 9-class DST model.

Figure 10. Decision making based on mass and belief for 25 classes DST

Table IV
PROBABILITY OF DETECTION AND FALSE ALARM
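A minimal Python sketch of how P_D and P_F can be computed from per-sample decisions. It assumes that H_1 means "comfort zone" (class 5), H_0 means any other class, and that the ground-truth labels from the 25-class DST model have already been mapped onto the nine original classes; that mapping, the function name and the example labels are hypothetical.

```python
def detection_rates(truth, decisions, comfort=5):
    """P_D = Pr(H1 | H1) and P_F = Pr(H1 | H0), where H1 means "comfort zone"."""
    pairs = list(zip(truth, decisions))
    positives = [d for t, d in pairs if t == comfort]  # decisions when truth is H1
    negatives = [d for t, d in pairs if t != comfort]  # decisions when truth is H0
    p_d = sum(d == comfort for d in positives) / len(positives) if positives else float("nan")
    p_f = sum(d == comfort for d in negatives) / len(negatives) if negatives else float("nan")
    return p_d, p_f


# Hypothetical per-sample labels: ground truth vs. decisions of a model under test.
truth = [5, 5, 5, 6, 6]
decisions = [5, 5, 6, 5, 6]
print(detection_rates(truth, decisions))  # P_D = 2/3, P_F = 0.5
```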


VI. CONCLUSIONS 


The feasibility of using Dezert-Smarandache Theory (DSmT) for big data processing is explored in this paper. Methods in DSmT such as PCR5 have very high computational complexity, so they cannot be directly applied to multiple big data sources with high cardinality. We propose a DSm hybrid model with a reduced number of classes, and thus low computational cost, and evaluate its performance through a case study. Specifically, the proposed method is applied to analyze temperature and humidity data for smart building applications. Our results show that the proposed DSm hybrid model remains practical because its number of classes stays low, while the DST model does not, due to the exponential growth in the number of classes. Compared to DST with the same number of classes, DSmT performs much better when the data contain a high level of uncertainty. The results on real data sets demonstrate the potential of the proposed method for big data processing when the data sets contain a high level of uncertainty.

Figure 11. PCR5 decision based on test data for 9 classes DSm model